home *** CD-ROM | disk | FTP | other *** search
- From: James A. Woods <genrad!decvax!decwrl!jaw@RIACS.ARPA>
- Subject: egrep - More Pep for Boyer-Moore Grep
- Newsgroups: mod.sources
- Approved: jpn@panda.UUCP
-
- Mod.sources: Volume 4, Issue 47
- Submitted by: James A. Woods <ames!jaw@RIACS.ARPA>
-
-
- as advertised in net.unix. -- ames!jaw
-
- ----------
- #! /bin/sh
- # This is a shell archive, meaning:
- # 1. Remove everything above the #! /bin/sh line.
- # 2. Save the resulting text in a file.
- # 3. Execute the file with /bin/sh (not csh) to create the files:
- # pep4grep.1
- # pep4grep.2
- # Makefile
- # egrep.c
- # compat-sys5.c
- # This archive created: Wed Apr 2 18:19:14 1986
- export PATH; PATH=/bin:$PATH
- echo shar: extracting "'pep4grep.1'" '(6104 characters)'
- if test -f 'pep4grep.1'
- then
- echo shar: will not over-write existing file "'pep4grep.1'"
- else
- cat << \SHAR_EOF > 'pep4grep.1'
- >From postnews Tue Mar 18 18:04:08 1986
- Subject: More Pep for Boyer-Moore Grep (part 1 of 2)
- Newsgroups: net.unix
-
- # The chief defect of Henry King
- Was chewing little bits of string.
-
- -- Hilaire Belloc, Cautionary Tales [1907]
-
- # Attempt the end, and never stand to doubt
- Nothing's so hard but search will find it out.
-
- -- Robert Herrick, Hesperides [1648]
-
- The world does not need another 'grep' variant. And so, what is this
- we offer? On the surface, the exact same 'egrep' actually, but underneath,
- a swift Boyer-Moore hybrid, in C, which can beat assembler versions utilizing
- microcoded string search instructions. The offering, designed in the
- Kernighanian sense to utilize the existing 'egrep' when it must, also
- makes use of Mr. Henry Spencer's regexp(3) functions in an unusual way.
- For the edification of those without on-line access to system source code,
- the vendor-supplied 'egrep' is left in a pristine state.
-
- With code now wending its way to mod.sources, we obtain the following
- results. Times (in seconds) are all measured on a VAX 11/750 system running
- BSD 4.2 on Fujitsu Eagles, although our 'egrep' has been run on the Sun 2,
- V7 Unix/PDP 11, Vaxen configured with System V, and, for added effect, the
- NASA Ames Cray 2.
-
- 200K bytes user sys notes
-
- (new) egrep astrian /usr/dict/words 0.4 0.5 implementation by "jaw"
- match " " 0.5 0.5 VAX-only (Waterloo)
- bm " " 1.1 0.6 Peter Bain's version 2
- (old) egrep " " 5.6 1.7 standard
-
- [note: the output here is the single word "Zoroastrian".]
-
- Aha, you quip -- this is all very fine for the 99 and 44/100's percent
- metacharacter-free world, but what about timing for shorter strings, character
- folding, as well as for the more interesting universe of extended regular
- expressions? Samples forthwith. (Egrep below refers to the new one, times for
- the /usr/bin code being about the same as above on most any pattern.)
-
- egrep zurich 0.4 0.5 0 words output
- egrep -i zuRich 0.4 0.5 1
- egrep -i zeus 0.6 0.6 1
- egrep -i zen 0.7 0.6 11
- bm zen 2.2 0.6 10
- egrep ZZ 0.8 0.6 0
- bm ZZ 3.0 0.7 0
- egrep -c Z 1.5 0.6 19
- bm -c Z 5.9 0.7 19
-
- Admittedly, most people (or programs) don't search for single characters,
- where Boyer-Moore is a bit slow, but it's important for the layered regular
- expression approach described herein. We might point out from the above that
- the popular "fold" option crippled by 'bm' costs little; it's only a slight
- adjustment of the precomputed "delta" table as well as a single character
- array reference in a secondary loop. Why has Bain claimed complexity for this?
- Also, the times show that the inner loop chosen for our code (modeled after
- the original speedup done by Boyer-Moore for the PDP 10) consistently betters
- the "blindingly fast" version by a factor of two to three. The tipoff was
- from previous paper studies (esp. Horspool, see header notes in code) noting
- that the algorithm should, when implemented efficiently, best typical microcode.
- Now it does.
-
- while ( (k += delta0 ( *k )) < strend )
- ; /* over 80% of time spent here */
-
- is the key (modulo precomputation tricks), and takes but three or four
- instructions on most machines.
-
- Basic method for regular expressions:
-
- (1) isolate the longest metacharacter-free pattern string via the
- "regmust" field provided by H. Spencer's regcomp() routine.
-
- (Non-kosher, but worth not re-inventing the wheel.
- v8 folks just might have to reverse-engineer Spencer's
- reverse-engineering to provide equivalent functionality.
- You see, there are many more sites running his code than v8.
- Besides, we enjoy using regexpr technology on itself.
-
- (2) for "short" input, submatching lines are passed to regexec().
-
- (3) for "long" input, start up a standard 'egrep' process via
- popen() or equivalent. Why not just use regexec()? Unfortunately
- for our application, Spencer's otherwise admirable finite-state
- automaton exhibits poor performance for complex expressions.
- Setting a threshold on input length, though not perfect, helps.
- If pipes on Unix were free, we'd use this way exclusively.
- Until then, we buy happiness for those who might
-
- egrep stuff /usr/spool/news/net/unix/*
-
- or on other directories full of short files.
-
- So,
- newegrep -i 'hoe.*g' words 1.2 1.1
- {shoestring,Shoenberg}
- newegrep '(a|b).*zz.*[od]$' words 1.5 1.1
- {blizzard,buzzword,palazzo}
- oldegrep 6.3 1.4
- but,
- {new,old}egrep -c '(first|second)' similar times (no isolate)
-
- Again, we stress that given the different nature of the simulations of the two
- nondeterministic reg. expr. state-machines (one functionless), cases can be
- "cooked" to show things in a bad light, so a hybrid is warranted.
- We can generally do better incorporating the Boyer-Moore algorithm directly
- into the AT&T code. For the last example, the abstraction
-
- (egrep first words &; egrep second words) | sort -u | wc
-
- ideally would work better on a parallel machine, but if you're expecting
- something as amazing in this draft as, say, Morwen B. Thistlethwaite's 52-move
- Rubik's Cube solution, you're in the wrong place.
-
- About options -- system V ones are supported (-c, -l, bonus -i for BSD);
- the 'egrep' here just hands off patterns to old code for things like -n, -b,
- -v, and multiple patterns. As a bone to throw to the enemies of the cat-v
- school, there is a -h (halt after printing first match), but we don't talk
- about it much. Multiple patterns can done ala 'bm' but laziness in the
- presence of lack of knowledge of where 'fgrep' wins has prevailed for version 1.
-
- Personally I feel that adapting ("internationalizing") the 'egrep' effort
- for two-byte Kanji is FAR more important than tweeking options or tradeoffs,
- so for you large-alphabet Boyer-Moore algorithm specialists, send ideas
- this way.
-
- Further historical/philosophical comments follow in the sequel.
-
- James A. Woods (ames!jaw)
- NASA Ames Research Center
-
- SHAR_EOF
- if test 6104 -ne "`wc -c < 'pep4grep.1'`"
- then
- echo shar: error transmitting "'pep4grep.1'" '(should have been 6104 characters)'
- fi
- fi
- echo shar: extracting "'pep4grep.2'" '(4623 characters)'
- if test -f 'pep4grep.2'
- then
- echo shar: will not over-write existing file "'pep4grep.2'"
- else
- cat << \SHAR_EOF > 'pep4grep.2'
- >From postnews Tue Mar 18 18:05:22 1986
- Subject: More Pep for Boyer-Moore Grep (part 2 of 2)
- Newsgroups: net.unix
-
- # "Gratiano speaks an infinite deal of nothing, more than any man in all
- of Venice. His reasons are as two grains of wheat hid in two bushels of
- chaff: you shall seek all day ere you find them, they are not worth
- the search." -- Shakespeare, Merchant of Venice
-
- ... or, part 2, "Reach out and Boyer-Moore Egrep Someone"
-
- Maybe you never use 'grep'. Then ignore this. But if you do, why not
- use the best algorithm? Serious addicts know that for unstructured yet
- stable text, B-trees are used for speed, or something like Lesk's nifty
- (and unavailable) 'grab' suite for inverted files are ways to go. Barring file
- inversion daemons for netnews and other ephemera, we are limited to the
- present improvements.
-
- Proper skeptics should question why a nearly I/O-bound program
- (but not for any CPU with less than the power of a 780, alas) should be
- made more so. The question was posed in B & M's classic 1978 CACM
- paper -- the answer then was to free up more CPU cycles for timesharing.
- Now, our motivations are more mundane (we won't have desktop 5 MIP machines
- for another year), but not only that, we've discovered that the Cray 2's
- standard 'egrep' is also very anemic, performing 8-12 times as worse as ours
- on simple patterns. For shame, especially since hearing of the rumor that
- certain group theorists have a search application ready for testing.
- Boyer-Moore could fill in until a Cray vectorizing C compiler shows up.
- Sheer speed for machines whose filesystems are cached in memory is nice too.
-
- A quick-and-dirty rundown of the debts to which the new hybrid pays
- now follows.
-
- Thompson, K. T. (CACM, November 1968):
- Regular Expression Search Algorithm. As usual, obvious
- once you understand it. The current 'egrep'. Still
- useful as a base. Abstracted by Aho/Ullman as Algorithm
- 9.1 in Design and Analysis of Computer Algorithms.
-
- Boyer/Moore:
- Not quite pre-Unix. Oh well. Modern designers should
- know better now, if they want their stuff to get out there.
- By the way, I haven't used delta2 (or 1) since the O(mn) case
- case doesn't come up too often. Sure Knuth stood on his head
- to better the linearity, but his proof had a bug in it until
- the 1980 SIAM J. Comput. retraction. Would you want to code
- something that even Knuth trips up on?
-
- Now to assuage nagging feelings that geneticists might want
- to search entire libraries of 9000-unit nucleotide protein
- sequences for ((AGCA|TTGCA).*TGC)|AGCT)?T?A+ or some nonsense
- which MIGHT be nonlinear, you would want delta2. So convince
- someone to do the Galil/Apostolico/Giancarlo 2n comparison
- worst case stuff. See egrep.c for reference.
-
- Gosper, W. (HAKMEM 1972):
- Gosper didn't get around to the Thompson-like machine until
- 1972 with HAKMEM. His PDP 10 code is nevertheless valiant.
- He is also (barely) credited with conceiving the backwards
- match idea independently. Where is he now?
-
- Morris/Pratt:
- Nice guys, but for this purpose, has-beens.
- Neat to see a hacker's triumph bury some theory.
-
- Horspool (Software Practice & Experience, 1980):
- Now here's a Canadian after the heart of things
- (perfect hashing, text compression, NP-complete
- code generation probs., etc.) Did some Amdahl
- timings to show that delta2 is not so hot.
- Knows about Search For Least Frequent Character First,
- which is useful for short patterns.
-
- {,e,f}grep man page:
- The laughable bugnote "but we do not know a single algorithm
- that spans a wide enough range of space-time tradeoffs"
- certainly presumes that there is no such thing as switching
- logic. How the 'grep' family got into a multiple-version
- mess is probably a Guy Harris story; 'egrep' looks like the
- winner, as its functionality is pretty much a superset of
- the other two. The K & P teaser (p. 105) offers hope for
- unification, but we see no difference with extant V8 code.
-
- "Not cited in the text" -- the sexy randomized Karp/Rabin string searcher
- (Sedgewick, Algorithms, or Karp's Turing Award Lecture), and the ribald classic
- Time Warps, String Edits, and Macromolecules -- The Theory and Practice
- of Sequence Comparison (Kruskal & Sankoff). Inquire within.
- Thanks for your patience,
-
- James A. Woods (ames!jaw)
- NASA Ames Research Center
-
- P.S.
- Current applications for Boyer-Moore code include modification of
- 'fastfind' for true speed, as well as substring search for 'grab', both
- benefiting from BM-style search thru incrementally-compressed files/indices.
-
- SHAR_EOF
- if test 4623 -ne "`wc -c < 'pep4grep.2'`"
- then
- echo shar: error transmitting "'pep4grep.2'" '(should have been 4623 characters)'
- fi
- fi
- echo shar: extracting "'Makefile'" '(593 characters)'
- if test -f 'Makefile'
- then
- echo shar: will not over-write existing file "'Makefile'"
- else
- cat << \SHAR_EOF > 'Makefile'
- # optional items for ENV:
- # -DBSD make -i work as in System V
- # -I. use regexp.h in current directory, not /usr/include
- # -DEGREP=path default /usr/bin/egrep
- # -DV7 invoke xread() for system time quirk
-
- ENV= -I. -DBSD
-
- # optional items for OBJ:
- # compat-sys5.o for V7 or BSD 4.2 systems w/no getopt(3) or string(3)
- # regexp.o if Henry Spencer's regexp(3) is not installed
- # regerror.o "
- # V8 people -- your regexp.h won't do
-
- OBJ= regexp.o regerror.o compat-sys5.o
-
- CFLAGS= -O -i $(ENV)
-
- egrep: egrep.o $(OBJ)
- cc $(CFLAGS) $(OBJ) egrep.o -o egrep
-
- install:
- mv egrep /usr/local
- SHAR_EOF
- if test 593 -ne "`wc -c < 'Makefile'`"
- then
- echo shar: error transmitting "'Makefile'" '(should have been 593 characters)'
- fi
- fi
- echo shar: extracting "'egrep.c'" '(11219 characters)'
- if test -f 'egrep.c'
- then
- echo shar: will not over-write existing file "'egrep.c'"
- else
- cat << \SHAR_EOF > 'egrep.c'
- /*
- Boyer/Moore/Gosper-assisted 'egrep' search, with delta0 table as in
- original paper (CACM, October, 1977). No delta1 or delta2. According to
- experiment (Horspool, Soft. Prac. Exp., 1982), delta2 is of minimal
- practical value. However, to improve for worst case input, integrating
- the improved Galil strategies (Apostolico/Giancarlo, Siam. J. Comput.,
- February 1986) deserves consideration.
-
- Method: extract longest metacharacter-free string from expression.
- this is done using a side-effect from henry spencer's regcomp().
- use boyer-moore to match such, then pass submatching lines
- to rexexp() for short input, or standard 'egrep' for long
- input. (this tradeoff is due to the general slowness of the
- regexp() nondeterministic machine on complex expressions,
- as well as the startup time of 'egrep' on short files.)
- alternatively, you may change the faster 'egrep' automaton
- to include boyer-moore directly.
- Future:
- beef up for multiple patterns ala bm/fgrep. can do fast -n
- via file rescan, but it's a luxury. adapt 'fastfind'.
- internationalize for kanji.
-
- James A. Woods Copyright (c) 1986
- NASA Ames Research Center
- */
- #ifdef V7
- #define BSD
- #define void int
- #endif
-
- #include <stdio.h>
- #include <ctype.h>
- #include <sys/types.h>
- #include <sys/stat.h>
- #include <regexp.h> /* must be henry spencer's version */
-
- #ifdef BSD
- #define strchr index
- #endif
- #ifndef EGREP
- #define EGREP "/usr/bin/egrep" /* prevent installation-dependent recursion */
- #endif
- #define BUFSIZE 8192
- #define PATSIZE 1000
- #define LARGE BUFSIZE + PATSIZE
- #define FSIZE 50000 /* algorithm tradeoff at this length (ad hoc) */
- #define NL '\n'
- #define EOS '\0'
-
- extern char *optarg;
- extern int optind;
-
- int cflag, iflag, eflag, fflag, lflag;
- int boyflag, rxflag;
- int hflag;
- int nfile, nsuccess;
- long nmatch;
-
- regexp *rspencer;
- char *pattern, *patboy;
- int delta0[256]; /* ascii only -- see note at gosper() */
- char cmap[256]; /* (un)folded characters */
- char str[BUFSIZE+2];
- char linetemp[BUFSIZE];
- char grepcmd[PATSIZE];
-
- struct stat stbuf;
- int fd;
- FILE *egout, *mcilroy(), *popen();
- char *strchr(), *strcpy(), *strncpy(), *strpbrk(), *malloc();
- char *fold(), *sys5fold();
-
- main ( argc, argv )
- int argc; char *argv[];
- {
- int c;
- int errflag = 0;
- int oldegrep = 0;
-
- while ( (c = getopt ( argc, argv, "bchie:f:lnv" ) ) != EOF )
-
- switch(c) {
-
- case 'f':
- fflag++;
- case 'b':
- case 'n':
- case 'v':
- oldegrep++; /* boyer-moore of little help here */
- continue;
- case 'c':
- cflag++;
- continue;
- case 'e':
- eflag++;
- pattern = optarg;
- continue;
- case 'h':
- hflag++; /* shh ... for newshounds */
- continue;
- case 'i':
- iflag++;
- continue;
- case 'l':
- lflag++;
- continue;
- case '?':
- errflag++;
- }
- argc -= optind;
- if ( errflag || ((argc <= 0) && !fflag) )
- oops ( "usage: egrep [-bcilnv] [-e exp] [-f file] [strings] [file]" );
- if ( !eflag ) {
- pattern = argv[optind++];
- argc--;
- }
- if ( oldegrep || (strchr ( pattern, '\n' ) != NULL) ) {
- execvp ( EGREP, argv );
- oops ( "can't exec old 'egrep'" );
- }
- if ( iflag ) {
- strcpy ( pattern, fold ( pattern ) );
- }
- if ( strpbrk ( pattern, "^$.[]()?+*|\\" ) == NULL ) { /* do boyer-moore only */
- boyflag++;
- rxflag = 0;
- patboy = pattern;
- }
- else {
- /*
- first compile a fake regexp to isolate longest
- metacharacter-free string
- */
- {
- char *dummyp;
- dummyp = malloc ( (unsigned) strlen ( pattern ) + 5 );
- sprintf ( dummyp, "(.)*%s", pattern );
- rspencer = regcomp ( dummyp );
- }
- if ( rspencer -> regmust == NULL ) { /* pattern too complicated */
- execvp ( EGREP, argv );
- oops ( "can't exec old 'egrep'" );
- }
- patboy = rspencer -> regmust;
- if ( (rspencer = regcomp ( pattern )) == NULL )
- oops ( "egrep: regcomp failure" );
- }
- gosper ( patboy );
- argv = &argv[optind];
- nfile = argc;
- if ( argc <= 0 ) { /* process stdin */
- if ( lflag )
- exit ( 1 );
- fd = 0;
- if ( !boyflag )
- if ( (egout = mcilroy ( (char *) NULL )) == NULL )
- oops ( "egrep: no processes" );
- boyermoore ( (char *) NULL, patboy );
- if ( !boyflag )
- pclose ( egout );
- }
- else {
- while ( --argc >= 0 ) {
- if ( (fd = open ( *argv, 0 )) <= 0 ) {
- fprintf ( stderr, "egrep: can't open %s\n", *argv );
- nsuccess = 2;
- argv++;
- continue;
- }
- if ( !boyflag ) {
- fstat ( fd, &stbuf );
- if ( stbuf.st_size < FSIZE )
- rxflag = 1;
- else {
- rxflag = 0;
- if ( (egout = mcilroy ( *argv )) == NULL )
- oops ( "egrep: no processes" );
- }
- }
- boyermoore ( *argv, patboy );
- if ( !boyflag && !rxflag ) {
- fflush ( egout );
- pclose ( egout );
- }
- close ( fd );
- argv++;
- }
- }
- exit ( (nsuccess == 2) ? 2 : (nsuccess == 0) );
- }
-
- boyermoore ( file, pattern ) /* "reach out and boyer-moore search someone" */
- char *file, *pattern; /* -- soon-to-be-popular bumper sticker */
- {
- register char *k, *strend, *s;
- register int j;
- int patlen;
- char *t;
- char *gotamatch();
- int nleftover, count;
-
- patlen = strlen ( pattern );
- nleftover = nmatch = 0;
-
- #ifdef V7
- #define read xread
- #endif
- while ( (count = read ( fd, str + nleftover, BUFSIZE - nleftover )) > 0 ) {
-
- #undef read
- count += nleftover;
- if ( count != BUFSIZE && fd != 0)
- str[count++] = NL; /* insurance for broken last line */
- str[count] = 0;
- for ( j = count - 1; str[j] != NL && j >= 0; )
- j--;
- if ( j < 0 ) { /* break up long line */
- str[count] = NL;
- str[++count] = EOS;
- strend = str + count;
- linetemp[0] = EOS;
- nleftover = 0;
- }
- else { /* save partial line */
- strend = str + j;
- nleftover = count - j - 1;
- strncpy ( linetemp, str + j + 1, nleftover );
- }
- k = str + patlen - 1;
-
- for ( ; ; ) {
- /*
- for a large class of patterns, upwards of 80% of
- match time is spent on the next line.
- we beat existing microcode (vax 'matchc') this way.
- */
- #ifndef V7
- while ( (k += delta0[*(unsigned char *)k]) < strend )
- ;
- #else
- while ( (k += delta0[*(char *)k & 0377]) < strend )
- ;
- #endif
- if ( k < (str + LARGE) )
- break;
- k -= LARGE;
-
- j = patlen - 1;
- s = k - 1;
- while ( cmap[*s--] == pattern[--j] )
- ;
- /*
- delta-less shortcut for literati, but
- short shrift for genetic engineers.
- */
- if ( j >= 0 )
- k++;
- else { /* submatch */
- t = k;
- s = k - patlen + 1;
- do
- ;
- while ( *s != NL && --s >= str );
- k = s + 1; /* at line start */
-
- if ( boyflag ) {
- if ( (k = gotamatch ( file, k )) == NULL )
- return;
- }
- else if ( !rxflag ) { /* fob off to egrep */
- do
- putc ( *k, egout );
- while ( *k++ != NL );
- }
- else { /* regexec() wants EOS */
- s = t;
- do
- ;
- while ( *s++ != NL );
- *--s = EOS;
-
- if ( regexec ( rspencer, ((iflag) ? fold ( k ) : k) ) == 1 ) {
- *s = NL;
- if ( gotamatch ( file, k ) == NULL )
- return;
- }
- *s = NL;
- k = s + 1;
- }
- if ( k >= strend )
- break;
- }
- }
- strncpy ( str, linetemp, nleftover );
- }
- if ( cflag && (boyflag || rxflag) ) {
- if ( nfile > 1 )
- printf ( "%s:", file );
- printf ( "%ld\n", nmatch );
- }
- }
-
- FILE *
- mcilroy ( file ) /* open a pipe to old 'egrep' for long files, */
- char *file; /* ... where regexp() might be inefficient */
- {
- #ifdef BSD
- int iflagsave = 0;
- char *patsave;
-
- if ( iflag ) {
- iflagsave = 1;
- iflag = 0;
- patsave = pattern;
- pattern = sys5fold ( pattern ); /* uncripple -i */
- }
- #endif
- /*
- -l via -c is sneaky, but we're short a good prwopen()
- */
- if ( lflag && !cflag )
- sprintf ( grepcmd, "%s -c %s '%s' | sed -n '/^0$/!s|.*|%s|p'",
- EGREP, (iflag ? "-i" : " "), pattern, file );
- else if ( nfile <= 1 )
- sprintf ( grepcmd, "%s %s %s '%s'", EGREP,
- (cflag ? "-c" : " "), (iflag ? "-i" : " "), pattern );
- else
- sprintf ( grepcmd, "%s %s %s '%s' | sed -n 's|^|%s:|p'", EGREP,
- (cflag ? "-c" : " "), (iflag ? "-i" : " "), pattern, file );
- /*
- we thank mr. thompson for an especial ndfa simulation.
- (consult cacm, june 1968, or aho et. al., design & analysis ...,
- algorithm 9.1).
- */
- #ifdef BSD
- if ( iflagsave ) {
- iflag = 1;
- pattern = patsave;
- }
- #endif
- return ( popen ( grepcmd, "w" ) );
- }
-
- gosper ( pattern ) /* compute "boyer-moore" delta table */
- char *pattern; /* ... HAKMEM lives ... */
- {
- register int j, patlen;
- /*
- for multibyte characters (e.g. kanji), there are ways.
- extrapolating 256 to 64K may be unwise, in favor of more
- dynamics within boyermoore() itself.
- */
- patlen = strlen ( pattern );
-
- for ( j = 0; j < 256; j++ ) {
- delta0[j] = patlen;
- cmap[j] = (char) j; /* could be done at compile time */
- }
- for ( j = 0; j < patlen - 1; j++ )
- delta0[pattern[j]] = patlen - j - 1;
- delta0[pattern[patlen - 1]] = LARGE;
-
- if ( iflag ) {
- for ( j = 0; j < patlen - 1; j++ )
- if ( islower ( (int) pattern[j] ) )
- delta0[toupper ((int) pattern[j])] = patlen - j - 1;
- if ( islower ( (int) pattern[patlen - 1] ) )
- delta0[toupper ((int) pattern[patlen - 1])] = LARGE;
- for ( j = 'A'; j <= 'Z'; j++ )
- cmap[j] = (char) tolower ( (int) j );
- }
- }
-
- char *
- gotamatch ( file, s ) /* print, count, or stop on full match */
- register char *file, *s;
- {
- nsuccess = 1;
- if ( cflag ) { /* -c overrides -l, for some reason */
- nmatch++;
- do
- ;
- while ( *s++ != NL );
- }
- else if ( lflag ) {
- puts ( file );
- return ( NULL );
- }
- else {
- if ( nfile > 1 )
- printf ( "%s:", file );
- do
- putchar ( *s );
- while ( *s++ != NL );
- }
- return ( (hflag && !cflag) ? NULL : s );
- }
-
-
- char *
- fold ( line )
- char *line;
- {
- static char fline[BUFSIZE];
- register char *s, *t = fline;
-
- for ( s = line; *s != EOS; s++ )
- *t++ = (isupper ((int) *s) ? (char) tolower ((int) *s) : *s );
- *t = EOS;
- return ( fline );
- }
-
- #ifdef BSD
- char *
- sys5fold ( line ) /* a workaround kludge for the old alma mater, e.g. */
- char *line; /* ... "ZIP.*PY" becomes "[zZ][iI][pP].*[pP][yY]" */
- {
- register char *p, *s;
- static char pline[BUFSIZE];
- char c, stencil[5];
- int bdanger = 0;
-
- pline[0] = EOS;
- for ( s = line; *s != EOS; s++ ) {
- if ( *s == '[' )
- bdanger = 1;
- if ( bdanger == 1 && *s == ']' )
- bdanger = 0;
- if ( isalpha ( (int) *s ) ) {
- c = ( (islower ( *s )) ? (char) toupper ( (int) *s ) :
- (char) tolower ( (int) *s ) );
- sprintf ( stencil, ((bdanger) ? "%c%c" : "[%c%c]"), *s, c );
- }
- else {
- stencil[0] = *s;
- stencil[1] = EOS;
- }
- strcat ( pline, stencil );
- }
- *s = EOS;
- return ( pline );
- }
- #endif
-
- oops ( message )
- char *message;
- {
- fprintf ( stderr, "%s\n", message );
- exit ( 2 );
- }
- SHAR_EOF
- if test 11219 -ne "`wc -c < 'egrep.c'`"
- then
- echo shar: error transmitting "'egrep.c'" '(should have been 11219 characters)'
- fi
- fi
- echo shar: extracting "'compat-sys5.c'" '(2645 characters)'
- if test -f 'compat-sys5.c'
- then
- echo shar: will not over-write existing file "'compat-sys5.c'"
- else
- cat << \SHAR_EOF > 'compat-sys5.c'
- /*
- * regcomp(), regexec(), and xread() were posted to mod.sources by
- * henry spencer, and are too large to be included here.
- *
- * This file contains strcspn, strchr, strpbrk, and getopt.
- */
-
- #include <stdio.h>
- /*
- * strcspn - find length of initial segment of s1 consisting entirely
- * of characters not from s2
- */
-
- int strcspn(s1, s2)
- char *s1;
- char *s2;
- {
- register char *scan1;
- register char *scan2;
- register int count;
-
- count = 0;
- for (scan1 = s1; *scan1 != '\0'; scan1++) {
- for (scan2 = s2; *scan2 != '\0';) /* ++ moved down. */
- if (*scan1 == *scan2++)
- return(count);
- count++;
- }
- return(count);
- }
-
- char *strchr(a, b)
- char *a, b;
- {
- char *index();
- return (index(a, b));
- }
-
- /* strpbrk - Returns a pointer to the first character of source that is any
- * of the specified keys, or NULL if none of the keys are present
- * in the source string.
- */
- char *strpbrk(source, keys)
- char *source, *keys;
- {
- register int loc = 0, key_index = 0;
-
- while (source[loc] != '\0') {
- key_index = 0;
- while (keys[key_index] != '\0')
- if (keys[key_index++] == source[loc])
- return((char *) (source + loc));
- loc++;
- }
-
- return(NULL);
- }
-
-
- /*
- * getopt - get option letter from argument vector
- */
- int opterr = 1, /* useless, never set or used */
- optind = 1, /* index into parent argv vector */
- optopt; /* character checked for validity */
- char *optarg; /* argument associated with option */
-
- #define BADCH (int)'?'
- #define EMSG ""
- #define tell(s) fputs(*nargv,stderr);fputs(s,stderr); \
- fputc(optopt,stderr);fputc('\n',stderr);return(BADCH);
-
- getopt(nargc,nargv,ostr)
- int nargc;
- char **nargv,
- *ostr;
- {
- static char *place = EMSG; /* option letter processing */
- register char *oli; /* option letter list index */
- char *index();
-
- if(!*place) { /* update scanning pointer */
- if(optind >= nargc || *(place = nargv[optind]) != '-' || !*++place) return(EOF);
- if (*place == '-') { /* found "--" */
- ++optind;
- return(EOF);
- }
- } /* option letter okay? */
- if ((optopt = (int)*place++) == (int)':' || !(oli = index(ostr,optopt))) {
- if(!*place) ++optind;
- tell(": illegal option -- ");
- }
- if (*++oli != ':') { /* don't need argument */
- optarg = NULL;
- if (!*place) ++optind;
- }
- else { /* need an argument */
- if (*place) optarg = place; /* no white space */
- else if (nargc <= ++optind) { /* no arg */
- place = EMSG;
- tell(": option requires an argument -- ");
- }
- else optarg = nargv[optind]; /* white space */
- place = EMSG;
- ++optind;
- }
- return(optopt); /* dump back option letter */
- }
-
- SHAR_EOF
- if test 2645 -ne "`wc -c < 'compat-sys5.c'`"
- then
- echo shar: error transmitting "'compat-sys5.c'" '(should have been 2645 characters)'
- fi
- fi
- exit 0
- # End of shell archive
-